2018/10/31
It's recommended to read on Gitbook
What is a good design for a relational database (RDB)?
You might have seen an excel data sheet filled with lots of columns. I can imagine that kind of excel sheet is very messy. A well designed database should design and create a model, and then implement the model into database.
For RDB, we use ER model. However, we might still design a model with deflects. For example, a table combines student information and department information.
In this case, DNUM
is followed by SID
, but DNAME
and D_HEAD
are followed by DNUM
. That's dangerous, because it might cause anomalies while inserting, deleting, or modifying a tuple of data.
This is an example of bad designs. So, I am going to introduce to 5 concept of normal forms.
When we create a RDB, we should not only base on ER model, but also need to consider that if the database are normal forms. Then we will have a good design database.
Concept of normal forms were first proposed by Codd in 1972.
The value of an attribute should be an atomic value. It could not be multi-value, array, composite values, or any other relation. (ER model has already defined)
Primary key could be a combination of keys. If a key of the primary key, could determine an attributes in the table, it is a partial relationship between the primary key and non-prime attributes. Then, it violates the second normal form.
In this case, the primary key is SID
and PID
, and any attributes should be determined by the two. But, SNAME
is only determined by SID
, so it violates the rule. Therefore, this table should divide in three new tables to follow the concept.
If there are attributes that follow a non-primary key, these attributes should be an another table. Just as the example:
A table is in 4NF if and only if, for every one of its non-trivial multi-valued dependencies X ↠ Y, X is a superkey. That is, X is either a candidate key or a superset thereof.
Removing “bad” multi-valued dependencies help us get into 4th normal.
form, and into better design.
Considering the following example:
If Pizza Variety
and Delivery Area
are independent to Restaurant
, it violates 4NF. Because, if restaurant Pizza A1
add a new kind pizza Cheese Pizza
, it need to add three rows for the there locations that in the table.
In other words, adding a new kind of pizza, the table need to insert rows, which is the new pizza with each area, and it is easy to make errors when updating. To eliminate the possibility of these anomalies, 4NF suggests that the table should be separated.
Assume the table meets 1NF to 4NF.
Considering a table that has Traveling Salesman
(primary key), Brand
and Product
.
Imagining a extreme case:
A Traveling Salesman
has certain Brand
s and certain Product
Types in their repertoire. If Brand
B1 and Brand
B2 are in their repertoire, and Product
Type P is in their repertoire, then (assuming Brand
B1 and Brand
B2 both make Product Type P,) the Traveling Salesman
must offer products of Product Type P those made by Brand B1 and those made by Brand B2.
That is to say, the Brand
and Product
are combined (4NF), the Salesman
is not able to only sale a certain Brand
but exclude one of products of that Brand
.
Then to solve this, splitting the able to three would make sense.
Traveling Salesman
with Brand
Traveling Salesman
with Product
Brand
with Product
ER model is good for RDB, but the database might easily makes error if we design not well.
Following these concept of normal forms would reduce the possibility of anomalies, which makes the database more clear and reliable.
Quick Link
1.StellarSQL Repository
2.Gitbook of this seriesAuthor
Liu, An-Chi (劉安齊). A software engineer, who loves writing code and promoting CS to people. Welcome to follow me at Facebook Page. More information on Personal Site and Github.